
fix: correct cache_hit_rate calculation and fix Vercel stream tool call handling#10994

Merged
sestinj merged 2 commits into main from nate/fix-cache-hit-rate-telemetry
Mar 4, 2026

Conversation


@sestinj sestinj commented Mar 3, 2026

Summary

  • Fix cache_hit_rate telemetry: The prompt_cache_metrics event was emitted twice per completion, and the cache hit rate denominator used only prompt_tokens (which maps to Anthropic's input_tokens — non-cached only). This caused ratios >> 1 when caching worked well (max observed: 89,892). Fixed by removing the duplicate emission and using the correct total: prompt_tokens + cache_read_tokens + cache_write_tokens.

  • Fix Vercel AI SDK tool call streaming: The Vercel AI SDK streams tool calls as tool-input-start → tool-input-delta → tool-input-end → tool-call. Previously tool-input-start was ignored and tool-call emitted the full call at the end, so streaming consumers never saw the tool call id on intermediate chunks. Now tool-input-start emits the initial chunk with id and function name (matching OpenAI's streaming format), and tool-call is a no-op to avoid duplicating args.

Test plan

  • Unit tests updated and passing for vercelStreamConverter.test.ts (15 tests)
  • Vercel SDK integration tests should now pass in CI (locally blocked by missing @ai-sdk/xai dep, env-only issue)

Two bugs in prompt_cache_metrics telemetry:

1. Duplicate emission: prompt_cache_metrics was emitted twice per API
   request — once using `actualInputTokens` and again using
   `fullUsage.prompt_tokens`. This doubled all event counts in PostHog
   and produced conflicting values.

2. Wrong denominator: cache_hit_rate was calculated as
   `cacheReadTokens / prompt_tokens`, but the Anthropic adapter maps
   `prompt_tokens` to only non-cached input tokens (`input_tokens`),
   excluding cache reads and writes. When caching works well, this
   produces ratios >> 1 (observed max: 89,892). The correct total is
   `prompt_tokens + cache_read_tokens + cache_write_tokens`.

Fix: remove the first duplicate emission and compute total_prompt_tokens
as the sum of all three token types. cache_hit_rate is now a proper 0-1
ratio.
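The before/after denominators can be sketched as below. This is an illustrative TypeScript sketch, not the PR's actual code: the interface and function names are assumptions, but the arithmetic mirrors the description above (`prompt_tokens` holds only non-cached input, so it must be summed with the cache counters before dividing).

```typescript
// Hypothetical usage shape: OpenAI-style prompt_tokens plus Anthropic-style
// cache counters. Field names are assumptions for illustration.
interface PromptCacheUsage {
  prompt_tokens: number;      // non-cached input tokens (Anthropic's input_tokens)
  cache_read_tokens: number;  // input tokens served from the prompt cache
  cache_write_tokens: number; // input tokens written to the prompt cache
}

// Buggy version: the denominator excludes cached tokens, so a warm cache
// drives the "rate" far above 1.
function cacheHitRateBuggy(u: PromptCacheUsage): number {
  return u.cache_read_tokens / u.prompt_tokens;
}

// Fixed version: divide by the full prompt total, yielding a proper 0-1 ratio.
function cacheHitRateFixed(u: PromptCacheUsage): number {
  const total = u.prompt_tokens + u.cache_read_tokens + u.cache_write_tokens;
  return total > 0 ? u.cache_read_tokens / total : 0;
}

// A well-cached request: almost everything is read from cache.
const usage: PromptCacheUsage = {
  prompt_tokens: 10,
  cache_read_tokens: 9_000,
  cache_write_tokens: 990,
};

console.log(cacheHitRateBuggy(usage)); // 900 — nonsense as a "rate"
console.log(cacheHitRateFixed(usage)); // 0.9
```

The better caching works, the smaller `prompt_tokens` gets relative to `cache_read_tokens`, which is exactly why the buggy ratio exploded into the tens of thousands.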

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sestinj sestinj requested a review from a team as a code owner March 3, 2026 05:01
@sestinj sestinj requested review from RomneyDa and removed request for a team March 3, 2026 05:01
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Mar 3, 2026

continue bot commented Mar 3, 2026

Docs Review: No documentation updates needed.

This PR contains internal telemetry fixes (correcting cache_hit_rate calculation and removing duplicate event emission) that don't affect user-facing features, configuration options, or developer workflows. The changes are purely internal to Continue's analytics infrastructure.


@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 1 file

…nverter

The Vercel AI SDK streams tool calls as tool-input-start → tool-input-delta
→ tool-input-end → tool-call. Previously, tool-input-start was ignored (returned
null) and tool-call emitted the full tool call at the end, which meant streaming
consumers never saw the tool call id on intermediate chunks.

Now tool-input-start emits the initial chunk with id and function name (matching
OpenAI's streaming format), and tool-call returns null to avoid duplicating args
already streamed via tool-input-delta.
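The conversion described above can be sketched as a small part-to-chunk mapper. The Vercel part names (tool-input-start, tool-input-delta, tool-input-end, tool-call) come from the PR; the part payload shapes, chunk shape, and `convertPart` function are illustrative assumptions, not the real `vercelStreamConverter` code.

```typescript
// Assumed minimal shapes for Vercel stream parts and OpenAI-style
// tool-call deltas; real types in the codebase will differ.
type VercelPart =
  | { type: "tool-input-start"; id: string; toolName: string }
  | { type: "tool-input-delta"; id: string; delta: string }
  | { type: "tool-input-end"; id: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; input: unknown };

interface OpenAIToolCallDelta {
  index: number;
  id?: string;
  function: { name?: string; arguments?: string };
}

function convertPart(part: VercelPart): OpenAIToolCallDelta | null {
  switch (part.type) {
    case "tool-input-start":
      // New behavior: emit the initial chunk carrying the id and function
      // name, matching OpenAI's streaming format.
      return { index: 0, id: part.id, function: { name: part.toolName, arguments: "" } };
    case "tool-input-delta":
      // Argument fragments stream through as they arrive.
      return { index: 0, function: { arguments: part.delta } };
    case "tool-input-end":
    case "tool-call":
      // tool-call is a no-op: its args were already streamed via the deltas.
      return null;
  }
}

const parts: VercelPart[] = [
  { type: "tool-input-start", id: "call_1", toolName: "readFile" },
  { type: "tool-input-delta", id: "call_1", delta: '{"path":' },
  { type: "tool-input-delta", id: "call_1", delta: '"a.ts"}' },
  { type: "tool-input-end", id: "call_1" },
  { type: "tool-call", toolCallId: "call_1", toolName: "readFile", input: { path: "a.ts" } },
];

const deltas = parts
  .map(convertPart)
  .filter((d): d is OpenAIToolCallDelta => d !== null);

console.log(deltas.length); // 3: one id/name chunk, then two argument deltas
```

Under the old behavior the first emitted chunk would have been an argument delta with no id, which is what broke streaming consumers that key tool calls by id.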

Generated with [Continue](https://continue.dev)

Co-Authored-By: Continue <noreply@continue.dev>
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Mar 3, 2026
@sestinj sestinj changed the title fix: correct cache_hit_rate calculation and remove duplicate emission fix: correct cache_hit_rate calculation and fix Vercel stream tool call handling Mar 3, 2026
@sestinj sestinj merged commit ec7030d into main Mar 4, 2026
59 of 60 checks passed
@sestinj sestinj deleted the nate/fix-cache-hit-rate-telemetry branch March 4, 2026 15:57
@github-project-automation github-project-automation bot moved this from Todo to Done in Issues and PRs Mar 4, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2026

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant